47 research outputs found

A quick search method for audio signals based on a piecewise linear representation of feature trajectories

Author: Kashino Kunio
Kimura Akisato
Kurozumi Takayuki
Murase Hiroshi
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 22/10/2007

This paper presents a new method for quick similarity-based search through long unlabeled audio streams to detect and locate audio clips provided by users. The method involves feature-dimension reduction based on a piecewise linear representation of a sequential feature trajectory extracted from a long audio stream. Two techniques enable us to obtain the piecewise linear representation: dynamic segmentation of feature trajectories and the segment-based Karhunen-Loève (KL) transform. In principle, the proposed search method guarantees the same search results as a search without the feature-dimension reduction. Experimental results indicate significant improvements in search speed. For example, the proposed method reduced the total search time to approximately 1/12 that of previous methods and detected queries in approximately 0.3 seconds from a 200-hour audio database.

Comment: 20 pages, to appear in IEEE Transactions on Audio, Speech and Language Processin

arXiv.org e-Print Archive

Crossref

Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation

Author: Harada Noboru
Kashino Kunio
Niizumi Daisuke
Ohishi Yasunori
Takeuchi Daiki
Publication date: 03/06/2023

General-purpose audio representations learned through self-supervised learning have demonstrated high performance in a variety of tasks. Although they can be optimized for an application by fine-tuning, even higher performance can be expected if they are specialized for the application during pre-training. This paper explores the challenges and solutions in specializing general-purpose audio representations for a specific application, using speech, a highly demanding field, as an example. We enhance Masked Modeling Duo (M2D), a general-purpose model, to close the performance gap with state-of-the-art (SOTA) speech models. To do so, we propose a new task, denoising distillation, to learn from fine-grained clustered features, and M2D for Speech (M2D-S), which jointly learns the denoising distillation task and the M2D masked prediction task. Experimental results show that M2D-S performs comparably to or outperforms SOTA speech models on the SUPERB benchmark, demonstrating that M2D can specialize in a demanding field. Our code is available at: https://github.com/nttcslab/m2d/tree/master/speech

Comment: Interspeech 2023; 5 pages, 2 figures, 6 tables, Code: https://github.com/nttcslab/m2d/tree/master/speec

arXiv.org e-Print Archive

INTER-TRIAL DIFFERENCE ANALYSIS THROUGH APPEARANCE-BASED MOTION TRACKING

Author: Kadota Koji
Kashino Makio
Kasino Kunio
Kimura Toshitaka
Mikami Dan
Publication venue: International Society of Biomechanics in Sports (ISBS)
Publication date: 07/08/2012

The purpose of this study is to develop a method for quantitative evaluation and visualization of inter-trial differences in the motion of athletes. Previous methods for kinematic analysis of human movement either require attaching specific equipment to a body segment or can only be used in an environment designed for analysis. They are therefore difficult to use for observing motions in real games. To enhance applicability to real-game situations, we propose appearance-based motion tracking. Our method requires only an image sequence from a camera. From the image sequence, trials are detected automatically and the differences between them are analyzed. We applied our method to the analysis of pitching motions in actual baseball games. Though we have no quantitative evaluations yet, the experimental results suggest the efficacy of our method.

ISBS (International Society of Biomechanics in Sports): Conference Proceedings Archive

Deep Attentive Time Warping

Author: Atarsaikhan Gantugs
Iwana Brian Kenji
Kashino Kunio
Kimura Akisato
Matsuo Shinnosuke
Uchida Seiichi
Wu Xiaomeng
Publication date: 13/09/2023

Measuring the similarity between time series is an important problem for time series classification. To handle nonlinear time distortions, Dynamic Time Warping (DTW) has been widely used. However, DTW is not learnable and suffers from a trade-off between robustness against time distortion and discriminative power. In this paper, we propose a neural network model for task-adaptive time warping. Specifically, we use an attention model, called the bipartite attention model, to develop an explicit time warping mechanism with greater distortion invariance. Unlike other learnable models that use DTW for warping, our model predicts all local correspondences between two time series and is trained based on metric learning, which enables it to learn the optimal data-dependent warping for the target task. We also propose pre-training our model with DTW to improve its discriminative power. Extensive experiments demonstrate the superior effectiveness of our model over DTW and its state-of-the-art performance in online signature verification.

Comment: Accepted at Pattern Recognitio

arXiv.org e-Print Archive
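
For readers unfamiliar with the Dynamic Time Warping baseline named in the Deep Attentive Time Warping abstract above, the following is a minimal sketch of classic DTW only. It is not the paper's bipartite attention model. The function name `dtw_distance`, the restriction to 1-D numeric sequences, and the absolute-difference local cost are assumptions chosen for illustration.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two 1-D numeric sequences.

    Illustrative sketch: the local cost is the absolute difference
    (an assumption), and no warping-window constraint is applied.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays on the same element
                cost[i][j - 1],      # b advances, a stays on the same element
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


if __name__ == "__main__":
    # A time-stretched copy of the same shape aligns at zero cost.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
    # A constant offset cannot be warped away.
    print(dtw_distance([0, 0], [1, 1]))
```

The recurrence is fixed in advance and has no trainable parameters, which illustrates the "not learnable" limitation that the abstract attributes to DTW.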